The following object is masked from 'package:lme4':
lmer
The following object is masked from 'package:stats':
step
library(ggplot2)library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Simulation for GLM(M)s
How many times have you sat with your data, googling, asking ChatGPT, pulling hair out trying to figure out which model to fit for your experiment/study? If you’re like me, it’s been a lot of times. How many times have you sat there thinking: Does this model even do what I think it’s doing? Again, if you’re like me, there have been many times.
Now, what if I told you there was a way to solve these kinds of problems, improve reproducibility and your statistics skills? That solution is simulation.
Example N: Biodiversity in food forests
There was a lively debate in our department recently about the correct way to analyse an experiment that examined how the food-forest land-use type affects biodiversity. Specifically, the researchers were interested in whether food forests had more biodiversity value than other land-use types (e.g. agricultural fields). The experiment was designed in the following way. There were 30 locations throughout the Netherlands and Flanders (Belgium). At each of these 30 locations, there were two plots from the two groups i.e. food forest and agricultural field (there were actually four groups but, for simplicity, we will use two for this example). Within each of plots at each of the 30 locations (i.e. \(30 * 2 = 60\) plots), ten samples were randomly taken. Therefore, within each of the two plots at a given location, 10 individual biodiversity measurements were taken (i.e. \(30 * 2 * 10 = 600\) measurements). For this analysis, we will view the agricultural field group as the control and the food forest group as the treatment but this could also reasonably be switched around. In this specific case, it does not matter.
Exercise: How would you analyse these data?
The resarcher proposed analysing these data using a linear mixed model (for simplicity, we will use a Gaussian error distribution but the same ideas apply to generalised linear models) with the following structure (where B - biodiversity, T - treatment and L - location):
\[
B \sim T + (1 | L)
\]
But is this correct way to analyse these data? Let’s think very carefully about what this model is assuming. Before we do this, we will write the model equation. Do not worry about this model equation for now. However, I wrote it here because we will want to come back to all the different parameters in this model.
So, what does this model assume? It assumes that the control plots (i.e. the agricultural fields) at all 30 locations (\(j\)) have some base-level of biodiversity (\(\alpha_j\)). These base-level biodiversity values (\(\alpha_j\)) are normally distributed with some global mean (\(\mu_{global}\)) and global standard deviation (\(\sigma_{global}\)). The model then assumes that the treatment has the same effect in all 30 locations (i.e. the difference between food forest and agricultural fields is the same across the 30 locations). Based on the base-level biodiversity values at the 30 locations (\(\alpha_j\)) and the treatment effect (\(\beta_{1}\)), the control (i.e. agricultural field) and treatment group (i.e. food forest) at each location have some mean value of biodiversity (\(\mu_{ij}\)). The observed biodiversity values (\(B_{ij}\)) are normally distributed around these mean values (\(\mu_{ij}\)) for each location-group combination with some residual standard deviation (\(\sigma_{residual}\)).
Now we are going to simulate from this model and, I promise, this will become clearer. We went painfully through all the parameters in the model because the idea behind using simulation is to simulate these assumptions with specific parameter values, fit the model and then see if our model reproduces these parameter values. If the model cannot reproduce the parameters, then it means that our assumptions about how the model works are probably wrong.
Simulation 1
To start, we will specify the global mean (\(\mu_{global}\)) and the global standard deviation (\(\sigma_{global}\)) of the normal distribution that describes the base-level of biodiversity at each location in the control group (i.e. the agricultural field). Let’s visualise this:
# set-up the global mean (mu_global = 100) and global standard deviation (sigma_global = 10)mu_global <-100sigma_global <-10# sample from this distribution and plot the resultsdist_global <-rnorm(n =100000, mean = mu_global, sd = sigma_global)ggplot(mapping =aes(x = dist_global)) +geom_density(fill ="grey", bw =2.5) +scale_x_continuous(expand =c(0, 0)) +scale_y_continuous(expand =c(0, 0)) +ylab("Density") +xlab("Biodiversity") +theme_minimal()
In this experiment, we have 30 different locations. Therefore, we need to sample 30 times from this global distribution. These 30 samples are then the 30 base-level biodiversity values for the agricultural field plot (i.e. control plot) at each location (\(\alpha_j\)). We can visualise these 30 location mean values as red vertical lines on the global distribution:
# set the number of locationslocations <-30# take 15 random samples from the distributiona_j <-rnorm(n = locations, mean = mu_global, sd = sigma_global)ggplot() +geom_density(mapping =aes(x = dist_global),fill ="grey", bw =2.5) +geom_vline(mapping =aes(xintercept = a_j), colour ="red", linewidth =0.25) +scale_x_continuous(expand =c(0, 0)) +scale_y_continuous(expand =c(0, 0)) +ylab("Density") +xlab("Biodiversity") +theme_minimal()
So, now we have our 30 base-level biodiversity values for agricultural fields at the 30 locations (\(\alpha_j\)). Let’s visualise this in a different way that is more intuitive. Specifically, we plot each mean value on the y-axis and the x-axis simply indexes the 30 locations. Moreover, we plot the global mean as a red, dashed, horizontal line.
# visualise the results as a location scatterplotggplot() +geom_point(mapping =aes(x =seq_along(a_j), y = a_j),colour ="black", size =2) +geom_hline(yintercept = mu_global, colour ="red", linetype ="dashed") +ylab("alpha j") +xlab("Location j") +theme_minimal()
The next thing that this model assumes is that the base-level biodiversity value in the control group (i.e. the agricultural field) is modified by the treatment (i.e. the food forest treatment group) (\(\beta_1\). Moreover, in this specific model formulation, the treatment effect is assumed to be constant across all 30 locations.
We’ll assume a treatment effect of 5 (i.e. \(\beta_1 = 5\)). This means that, on average, biodiversity in food forests will be greater than in the agricultural fields by 5 units (corresponds to 5%). We then add this treatment effect to all locations which gives us the mean biodiversity value at the 30 locations for the food forest.
# set the treatment effectb1 <-5# add the treatment effectmu_ij <- a_j + b1# add the control plots to this vectormu_ij <-c(a_j, mu_ij)# pull into a data.framedat_mu <- dplyr::tibble(location =rep(c(seq_len(locations)), 2),treatment =rep(c("agricultural: control", "food-forest: treatment"), each = locations),mu_ij = mu_ij)# visualise the results as a location scatterplotggplot(data = dat_mu,mapping =aes(x = location, y = mu_ij, colour = treatment)) +geom_point() +geom_hline(yintercept = mu_global, colour ="red", linetype ="dashed") +ylab("mu ij") +xlab("Location j") +theme_minimal()
So, now we have simulated the average biodiversity values in the control group (i.e. agricultural fields) and the average biodiversity values in the treatment group (i.e. food forests) as per the model (we hope). The last thing we need to do is simulate the observed biodiversity value i.e. the 10 replicates in each plot in each location (\(B_i\)).
To do this, we assume that the biodiversity values (\(B_i\)) are normally distributed around the means (\(\mu_{ij}\)) with some standard deviation (\(\sigma_{residual}\)) which we will set at 5:
# number of within-plot replicatesreplicates <-10# set the residual standard deviationsigma_residual <-5# simulate six replicates in each location-group combinationdat_list <-vector("list", length =nrow(dat_mu)) for (i inseq_along(dat_list)) {# extract the first location-group combination x <- dat_mu[i, ]# simulate 6 random draws using the mean and residual standard deviation y <-rnorm(n = replicates, mean = x$mu_ij, sd = sigma_residual)# add this to the data.frame z <- dplyr::tibble(location = x$location,treatment = x$treatment,B_i = y)# add this to an output list dat_list[[i]] <- z}# bind into a data.framedat <- dplyr::bind_rows(dat_list)
Now we have a fully simulated dataset which we can plot and which shows the simulated mean values as diamonds (\(\mu_{ij}\)) and the simulated observed biodiversity values as circles (\(B_i\)).
# visualise the results as a location scatterplotggplot() +geom_point(data = dat_mu,mapping =aes(x = location, y = mu_ij, colour = treatment), size =3.5, shape =18) +geom_point(data = dat,mapping =aes(x = location, y = B_i, colour = treatment),alpha =0.1) +geom_hline(yintercept = mu_global, colour ="red", linetype ="dashed") +ylab("mu ij") +xlab("Location j") +theme_minimal()
Using this simulated dataset, we can fit the linear mixed model using the lme4 package and see if we can reproduce the parameter values that we put into the model.
# convert the location variable to a characterdat$location <-as.character(dat$location)# fit the modellmm1 <- lmerTest::lmer(B_i ~ treatment + (1| location), data = dat)# extract the summarylmm1_sum <-summary(lmm1)# print the summaryprint(lmm1_sum)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: B_i ~ treatment + (1 | location)
Data: dat
REML criterion at convergence: 3763.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.2448 -0.6778 -0.0029 0.6501 2.4232
Random effects:
Groups Name Variance Std.Dev.
location (Intercept) 108.55 10.419
Residual 25.01 5.001
Number of obs: 600, groups: location, 30
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 99.2125 1.9240 29.6642 51.57 <2e-16
treatmentfood-forest: treatment 5.3733 0.4083 569.0000 13.16 <2e-16
(Intercept) ***
treatmentfood-forest: treatment ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
trtmntfd-:t -0.106
Now that we have fit the model, we can see if we can reproduce the parameters that we put into the model. Let’s start with the global mean (\(\mu_{global}\)) and global standard deviation (\(\sigma_{global}\)).
We set the global mean (\(\mu_{global}\)) to 100. Therefore, we should obtain a value close to 100 from our fitted model. To extract the mean, we need the intercept coefficient in this model. In this model, the estimated global mean was indeed very close to 100 (see below):
# extract the global mean from the model: intercept termprint(paste0("Estimated global mean: ", lmm1_sum$coefficients[1,1]))
[1] "Estimated global mean: 99.2124914240701"
What about the global standard deviation (\(\sigma_{global}\)) which we set at 10? Indeed, the estimated global standard deviation is also very close to the 10 that we simulated:
# extract the random effectslmm1_re <-VarCorr(lmm1)# the relevant standard deviation is the location standard deviationx <-attr(lmm1_re$location, "stddev")names(x) <-NULL# extract the global mean from the model: intercept termprint(paste0("Estimated global standard deviation: ", x))
[1] "Estimated global standard deviation: 10.4185866414567"
What about the treatment effect (\(B_1\))? We simulated a treatment effect of 5. Does the model reproduce this? Yes, it does.
# extract the treatment effect from the modelprint(paste0("Estimated treatment effect: ", lmm1_sum$coefficients[2,1]))
And, what about the residual variation which we set at 5? This also matches very well with our simulation:
# extract the global mean from the model: intercept termprint(paste0("Estimated global standard deviation: ", lmm1_sum$sigma))
[1] "Estimated global standard deviation: 5.00115486214523"
Finally, the model estimates a mean value for each location (\(\alpha_j\)). We can also check if these mean values match with simulated the mean values for each site and again, the model reproduces these values very well (the red, dashed line is the 1:1 line).
# extract the model coefficientslmm1_coef <-coef(lmm1)lmm1_coef <- lmm1_coef$location# get the rowname as a nummeric variable and sortlmm1_coef$location <-as.numeric(row.names(lmm1_coef))lmm1_coef <- dplyr::arrange(lmm1_coef, location)# how do these values correlate with the simulated valuesggplot() +geom_point(mapping =aes(x = a_j, y = lmm1_coef$`(Intercept)`)) +geom_abline(intercept =0, slope =1, linetype ="dashed", colour ="red") +xlab("Modelled values (a_j)") +ylab("Simulated values") +theme_minimal()
Hopefully, this exercise has convinced you that we understand the behaviour of the model because we were able to directly simulate it and the model reproduced all the simulated parameters with high levels of accuracy. Also, it is worth going back and increasing the replication in this simulation using the locations and replicates objects. If you increase these, the estimates should get better and better.
But, now that we understand the assumptions and behaviour, we may ask ourselves are these assumptions reasonable?
Exercise: Are there any assumptions about how this model works that you do not think make much sense?
Simulation 2
In my view, the assumption that the treatment effect is the same across the different sites is problematic for this kind of study. No food forest is the same and no agricultural field is the same. Moreover, the locations cover a huge area with different environmental conditions which means that the biodiversity difference between a food forest and a grassland might be very different. By only fitting a single treatment effect parameter, the model is ignoring all of that variation and this can cause the significance of the effect to be inflated (Barr et al. 2013; Gelman and Brown 2024). Rather, what we need to do, is model the potential variation in treatment effects among the locations. By doing this, we will obtain the average treatment effect across all 15 locations by directly modelling variation among locations.
So what is different about this model compared to the previous model? Well, the only thing that differs is that we are now fitting a separate intercept and treatment effect for each location rather than simply a separate intercept for each location and a single treatment effect. Moreover, we will assume that the intercepts and treatment effects are correlated. This just means that the model allows for the possibility that, for example, a high intercept is typically associated with a high (or low) treatment effect. As previously, I’m going to write the model equation out and then we will explore the assumptions of this model when we perform the simulation.
So, what does this model assume? It assumes that the control plots (i.e. the agricultural fields) at all locations (\(j\)) have some base-level of biodiversity (\(\alpha_j\)). This the same as the previous model. The model then assumes that the treatment (i.e. the difference between the food forest and the agricultural fields) differs across the locations such that there is a separate treatment effect at each location (\(\beta_{j}\)). Therefore, based on the base-level biodiversity values at the locations (\(\alpha_j\)) and the treatment effects at the locations (\(\beta_{j}\)), the control (i.e. agricultural field) and treatment group (i.e. food forest) at each location have some mean value of biodiversity (\(\mu_{ij}\)). Importantly, the \(\alpha_j\) and \(\beta_j\) values come from a common multivariate normal distribution. The observed biodiversity values (\(B_{ij}\)) are normally distributed around these mean values (\(\mu_{ij}\)) for each location-group combination with some residual standard deviation (\(\sigma_{residual}\)).
To start with, we need to define the parameters of the multivariate normal distribution. As with the previous simulation, we will assume that the global mean across locations is 100 (\(\bar{\alpha_j}\)) and that the standard deviation is 10 (variance is 100), (\(\tau_{\alpha}^2\)). Next, we assume that average treatment effect is 5 (\(\bar{\beta_j}\)) and that the standard deviation is 3 (variance = 9) (\(\tau_{\beta}^2\)). Next, we assume that the \(\alpha_j\) and \(\beta_j\) values are correlated across locations with a correlation coefficient of 0.3 (\(\rho\)). Using these simulated values, we can draw pairs of \(\alpha_j\) and \(\beta_j\) values for each location from a multivariate normal distribution.
# sim global mean (alpha_bar = 100) and variance (tau_alpha = 10)alpha_bar <-100tau_alpha <-10# sim global treatment effect mean (beta_bar = 5) and standard deviation (tau_beta = 2.5)beta_bar <-5tau_beta <-3# set the correlation coefficientrho <-0.3# build the covariance matrixcov_mat <-matrix(data =c((tau_alpha^2), rho*tau_alpha*tau_beta, rho*tau_alpha*tau_beta, (tau_beta^2)), 2, 2)# simulate from the multivariate normaldist_global <- MASS::mvrnorm(n =20000,mu =c(alpha_bar, beta_bar),Sigma = cov_mat)# plot this multivariate normal distributiondist_global <- dplyr::as_tibble(as.data.frame(dist_global))names(dist_global) <-c("a_j", "b_j")ggplot(data = dist_global,mapping =aes(x = a_j, y = b_j)) +geom_point() +ylab("beta_j") +xlab("alpha_j") +theme_minimal()
This figure shows 20000 samples from the multivariate normal distribution that we set-up for \(\alpha_j\) and \(\beta_j\). As you can see, there values are correlated such that information from \(\alpha_j\) can provide information about \(\beta_j\). As this is the general distribution, we need to sample 30 locations from this which we overlay as red dots where each point represents both an \(\alpha_j\) and \(\beta_j\) for each location.
# set the number of locationslocations <-30# simulate from the multivariate normaldist_samp <- MASS::mvrnorm(n = locations,mu =c(alpha_bar, beta_bar),Sigma = cov_mat)# plot this multivariate normal distributiondist_samp <- dplyr::as_tibble(as.data.frame(dist_samp))names(dist_samp) <-c("a_j", "b_j")ggplot() +geom_point(data = dist_global,mapping =aes(x = a_j, y = b_j)) +geom_point(data = dist_samp,mapping =aes(x = a_j, y = b_j), colour ="red", size =3.5) +ylab("beta_j") +xlab("alpha_j") +theme_minimal()
Using these sampled values of \(\alpha_j\) and \(\beta_j\), we can now build a dataset with the means of each control and treatment plot at each location (\(\mu_{ij}\)). Let’s visualise this.
# build a data.frame with the simulated meansdat_mu <- dplyr::tibble(location =rep(c(seq_len(locations)), 2),treatment =rep(c("agricultural: control", "food-forest: treatment"),each = locations),mu_ij =c(dist_samp$a_j, (dist_samp$a_j + dist_samp$b_j)))head(dat_mu)
# A tibble: 6 × 3
location treatment mu_ij
<int> <chr> <dbl>
1 1 agricultural: control 105.
2 2 agricultural: control 95.5
3 3 agricultural: control 112.
4 4 agricultural: control 101.
5 5 agricultural: control 93.1
6 6 agricultural: control 89.0
# visualise the results as a location scatterplotggplot(data = dat_mu,mapping =aes(x = location, y = mu_ij, colour = treatment)) +geom_point() +ylab("mu ij") +xlab("Location j") +theme_minimal()
As with the previous model, now that we have the simulated the average biodiversity values in the control group (i.e. agricultural fields) and the average biodiversity values in the treatment group (i.e. food forests), the last thing we need to do is simulate the observed biodiversity value i.e. the 10 replicates in each plot in each location (\(B_i\)).
To do this, we assume that the biodiversity values (\(B_i\)) are normally distributed around the means (\(\mu_{ij}\)) with some standard deviation (\(\sigma_{residual}\)) which we will set at 5:
# number of within-plot replicatesreplicates <-10# set the residual standard deviationsigma_residual <-5# simulate six replicates in each location-group combinationdat_list <-vector("list", length =nrow(dat_mu)) for (i inseq_along(dat_list)) {# extract the first location-group combination x <- dat_mu[i, ]# simulate 6 random draws using the mean and residual standard deviation y <-rnorm(n = replicates, mean = x$mu_ij, sd = sigma_residual)# add this to the data.frame z <- dplyr::tibble(location = x$location,treatment = x$treatment,B_i = y)# add this to an output list dat_list[[i]] <- z}# bind into a data.framedat <- dplyr::bind_rows(dat_list)
Now we have a fully simulated dataset which we can plot and which shows the simulated mean values as diamonds (\(\mu_{ij}\)) and the simulated observed biodiversity values as circles (\(B_i\)). As you can see, the difference between the biodiversity in food forests and agricultural fields differs between locations.
# visualise the results as a location scatterplotggplot() +geom_point(data = dat_mu,mapping =aes(x = location, y = mu_ij, colour = treatment), size =3.5, shape =18) +geom_point(data = dat,mapping =aes(x = location, y = B_i, colour = treatment),alpha =0.1) +geom_hline(yintercept = mu_global, colour ="red", linetype ="dashed") +ylab("mu ij") +xlab("Location j") +theme_minimal()
Now, as with the first model, we can test whether the model that we fit was able to reproduce the parameters that we put into the model. So, let’s fit the model:
# fit the model with correlated random intercepts and slopeslmm2 <- lmerTest::lmer(B_i ~ treatment + (1+ treatment | location), data = dat)# get the summarylmm2_sum <-summary(lmm2)# extract the variance-covariance valueslmm2_varcov <-VarCorr(lmm2)lmm2_varcov <- lmm2_varcov$location# print the summarylmm2_sum
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: B_i ~ treatment + (1 + treatment | location)
Data: dat
REML criterion at convergence: 3799.5
Scaled residuals:
Min 1Q Median 3Q Max
-3.0377 -0.6286 -0.0346 0.6185 3.2558
Random effects:
Groups Name Variance Std.Dev. Corr
location (Intercept) 109.86 10.481
treatmentfood-forest: treatment 10.21 3.196 0.18
Residual 25.17 5.017
Number of obs: 600, groups: location, 30
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 99.3417 1.9354 28.9995 51.329 < 2e-16
treatmentfood-forest: treatment 5.1003 0.7129 28.9997 7.154 7.12e-08
(Intercept) ***
treatmentfood-forest: treatment ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
trtmntfd-:t 0.089
Now, lets see if we can recover the parameters of the multivariate normal distribution that we simulated. We assumed that the global mean across locations was 100 (\(\bar{\alpha_j}\)) and that the standard deviation is 10 (variance is 100), (\(\tau_{\alpha}^2\)). The model was very close to both of these:
# check the alpha_bar estimate from the modelprint(paste0("Estimated global mean (alpha_bar): ", coefficients(lmm2_sum)[1, 1]))
[1] "Estimated global mean (alpha_bar): 99.3416520567536"
# check the alpha_bar estimate from the modelprint(paste0("Estimated global sd (tau_alpha): ", attr(lmm2_varcov, "stddev")[1]))
[1] "Estimated global sd (tau_alpha): 10.481191465"
We assumed that the average treatment effect was 5 (\(\bar{\beta_j}\)) and that the standard deviation was 3 (variance = 9) (\(\tau_{\beta}^2\)) and the model came very close to this:
# check the beta_bar estimate from the modelprint(paste0("Estimated global mean effect (beta_bar): ", coefficients(lmm2_sum)[2, 1]))
[1] "Estimated global mean effect (beta_bar): 5.10029410306472"
# check the alpha_bar estimate from the modelprint(paste0("Estimated global sd (tau_beta): ", attr(lmm2_varcov, "stddev")[2]))
[1] "Estimated global sd (tau_beta): 3.19602684906842"
Moreover, we assumed that these two random variables (\(\alpha_j\) and \(\beta_j\)) had a correlation of 0.30.
# check the alpha_bar estimate from the modelprint(paste0("Estimated correlation (rho): ", attr(lmm2_varcov, "correlation")[1, 2]))
In addition, we set the residual standard deviation to 5 (\(\sigma_{\text{residual}}\)). This also matches very well with our model
# extract the global mean from the model: intercept termprint(paste0("Estimated residual standard deviation: ", lmm2_sum$sigma))
[1] "Estimated residual standard deviation: 5.01659613606519"
Finally, the model estimates a mean value for each location (\(\alpha_j\)) and a treatment effect for each location (\(\beta_j\)). We can then see how well the model was able to reproduce these specific values. In general, the model does reasonably well although the treatment effects (\(\beta_j\)) could be better.
# extract the model coefficientslmm2_coef <-coef(lmm2)$location# get the rowname as a nummeric variable and sortlmm2_coef$location <-as.numeric(row.names(lmm2_coef))lmm2_coef <- dplyr::arrange(lmm2_coef, location)# how do these values correlate with the simulated values: a_jp1 <-ggplot() +geom_point(mapping =aes(x = dist_samp$a_j, y = lmm2_coef[[1]])) +geom_abline(intercept =0, slope =1, linetype ="dashed", colour ="red") +xlab("Modelled values (a_j)") +ylab("Simulated values") +theme_minimal()# how do these values correlate with the simulated values: b_jp2 <-ggplot() +geom_point(mapping =aes(x = dist_samp$b_j, y = lmm2_coef[[2]])) +geom_abline(intercept =0, slope =1, linetype ="dashed", colour ="red") +xlab("Modelled values (b_j)") +ylab("Simulated values") +theme_minimal()# combine and plotcowplot::plot_grid(p1, p2, nrow =1, ncol =2)
So, now we have done the simulation again but with the more realistic assumption that the difference in biodiversity between food forests and agricultural fields can vary across locations. Moreover, we have shown that the fitted model could reproduce the simulated parameters. Again, if we were to increase our replication (i.e. locations and replicates), we should get closer and closer to the true simulated values.
What was the point of these two simulations?
The reason we went through these two simulations was to help us think carefully about what we are assuming about the data-generating processes that produced our data when we fit different kinds of linear models. Specifically, I argued that, in this case, the assumption that the treatment effect (i.e. difference between food forests and agricultural fields in biodiversity) does not vary between locations is problematic and that, allowing the treatment effect to vary between locations is much more realistic. Now what I want to do is show you what would happen if our true data-generating process actually follows simulation 2 (i.e. treatment effect varies between locations) but we fit the model from simulation 1. I think this happens very frequently in ecological work.
So, what we are going to do is simulate the data based on Simulation 2 (i.e. allowing the treatment effect to vary between locations). We are then going to fit both models and see what happens to our inference over many replicate simmulations. To do this, I have wrapped the code for simulation 2 into a function that will allow us to do this. The code is very similar but is now more streamlined and allows one to easily run the simulation with different parameter values.
# sim_lmm: fit lmm1 and lmm2 to simulated datasim_lmm <-function(alpha_bar =100, tau_alpha =10, beta_bar =5, tau_beta =3, rho =0.5, locations =30, replicates =10, sigma_residual =5) {# Load required librariesif (!requireNamespace("dplyr", quietly =TRUE)) stop("Package 'dplyr' is required.")if (!requireNamespace("lmerTest", quietly =TRUE)) stop("Package 'lmerTest' is required.")if (!requireNamespace("MASS", quietly =TRUE)) stop("Package 'MASS' is required.")# Build the covariance matrix cov_mat <-matrix(c(tau_alpha^2, rho * tau_alpha * tau_beta, rho * tau_alpha * tau_beta, tau_beta^2), nrow =2, ncol =2)# Simulate from the multivariate normal distribution dist_samp <- MASS::mvrnorm(n = locations, mu =c(alpha_bar, beta_bar), Sigma = cov_mat)# Convert to tibble and name columns dist_samp <- dplyr::as_tibble(as.data.frame(dist_samp))names(dist_samp) <-c("a_j", "b_j")# Build the data frame with simulated means dat_mu <- dplyr::tibble(location =rep(seq_len(locations), 2),treatment =rep(c("control", "treatment"), each = locations),mu_ij =c(dist_samp$a_j, dist_samp$a_j + dist_samp$b_j) )# Simulate replicates for each location-treatment combination dat <- dat_mu |> dplyr::rowwise() |> dplyr::mutate(B_i =list(rnorm(n = replicates, mean = mu_ij, sd = sigma_residual)) ) |> dplyr::select(location, treatment, B_i) |> tidyr::unnest(cols =c(B_i))# Convert location variable to character dat$location <-as.character(dat$location)# Fit the linear mixed model with a random intercept only lmm1 <- lmerTest::lmer(B_i ~ treatment + (1| location), data = dat)# Fit the linear mixed model with correlated random intercepts and slopes lmm2 <- lmerTest::lmer(B_i ~ treatment + (1+ treatment | location), data = dat)# Return the simulated data and the fitted modellist(model1 = lmm1,model2 = lmm2)}
So, let’s see how this function works. We will run a single replicate simulation (i.e. simulation 2 where treatment effects vary across locations) and fit both models (model1 and model2 respectively). This should convince you that the models are similar to what we have already done. Moreover, feel free to adjust some of the parameters and see what happens. For example, if you increase the number of locations (i.e. locations) and the number of replicates within each location (i.e. replicates), you should get estimates that are closer to the true, simulated parameter values.
# run one replicate of the second simulationsim2_obj <-sim_lmm(alpha_bar =100, tau_alpha =10, beta_bar =5, tau_beta =3, rho =0.3, locations =30, replicates =10, sigma_residual =5)summary(sim2_obj$model1)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: B_i ~ treatment + (1 | location)
Data: dat
REML criterion at convergence: 3839.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.69030 -0.69398 0.04414 0.65313 2.72465
Random effects:
Groups Name Variance Std.Dev.
location (Intercept) 161.00 12.689
Residual 28.03 5.294
Number of obs: 600, groups: location, 30
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 97.2839 2.3367 29.5025 41.633 <2e-16 ***
treatmenttreatment 3.7421 0.4323 569.0000 8.657 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
trtmnttrtmn -0.092
summary(sim2_obj$model2)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: B_i ~ treatment + (1 + treatment | location)
Data: dat
REML criterion at convergence: 3815.1
Scaled residuals:
Min 1Q Median 3Q Max
-2.49797 -0.64916 0.02877 0.61466 2.72487
Random effects:
Groups Name Variance Std.Dev. Corr
location (Intercept) 146.739 12.114
treatmenttreatment 9.418 3.069 0.32
Residual 25.626 5.062
Number of obs: 600, groups: location, 30
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 97.2839 2.2309 29.0000 43.608 < 2e-16 ***
treatmenttreatment 3.7421 0.6963 29.0005 5.375 8.96e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
trtmnttrtmn 0.203
Comparison of the two models
Now, let’s simulate 1000 replicates of simulation 2 and compare the resulting t-values for the treatment effect for models 1 and 2.
sim_reps <-vector("list", length =1000)for (i inseq_along(sim_reps)) {# run one replicate of the second simulation (i.e. correlated random intercepts and slopes) sim_obj <-sim_lmm(alpha_bar =100, tau_alpha =10, beta_bar =5, tau_beta =3, rho =0.3, locations =30, replicates =10, sigma_residual =5)# extract the model objects sim1_sum <-summary(sim_obj$model1) sim2_sum <-summary(sim_obj$model2)# extract the t-values for the treatment sim1_t <-coefficients(sim1_sum)[2, 4] sim2_t <-coefficients(sim2_sum)[2, 4]# extract the p-value for the treatment sim1_p <-coefficients(sim1_sum)[2, 5] sim2_p <-coefficients(sim2_sum)[2, 5]# bind into a data.frame sim_reps[[i]] <- dplyr::tibble(rep = i,sim =c(1, 2),t_val =c(sim1_t, sim2_t),p_val =c(sim1_p, sim2_p))}
boundary (singular) fit: see help('isSingular')
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00330407 (tol = 0.002, component 1)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00354497 (tol = 0.002, component 1)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00311764 (tol = 0.002, component 1)
boundary (singular) fit: see help('isSingular')
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00226977 (tol = 0.002, component 1)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
Warning: Model failed to converge with 1 negative eigenvalue: -8.1e+00
# bind into a data.framesim_rep_df <- dplyr::bind_rows(sim_reps)
Now, we can compare the t-values between the two models on the same simulated data where the treatment effect varied between locations. What we see is that the t-values are consistently higher in model 1. This is because the model is ignoring the variation among locations. Model 2 is less certain about the treatment effect because it knows that there is variation. Indeed, the fact that the t-value for the treatment effect in model 1 is extremely high (on average 12) should raise some alarm bells.
# compare the t-values between the two simulationsggplot(data = sim_rep_df,mapping =aes(x = t_val, fill =as.character(sim))) +geom_density(alpha =0.5, bw =0.7) +theme_minimal()
In my view, the only time when model 1 would be appropriate is if there is only one replicate per site in which case the experiment becomes a randomised block design. The model would then not be able to properly estimate the varying slope effects across locations. The other case is if there is no variation among locations in the treatment effect. Let’s do this by simply setting the slope variation to zero. What happens in this case, is that the more complex model 2 becomes very similar to model 1. Moreover, the model will probably complain because of some numerical problems (e.g. boundary fit singular). This occurs because the model is trying to estimate the variation in beta but it is simply not there.
sim_reps <-vector("list", length =1000)for (i inseq_along(sim_reps)) {# run one replicate of the second simulation (i.e. correlated random intercepts and slopes) sim_obj <-sim_lmm(alpha_bar =100, tau_alpha =10, beta_bar =5, tau_beta =0, rho =0, locations =30, replicates =10, sigma_residual =5)# extract the model objects sim1_sum <-summary(sim_obj$model1) sim2_sum <-summary(sim_obj$model2)# extract the t-values for the treatment sim1_t <-coefficients(sim1_sum)[2, 4] sim2_t <-coefficients(sim2_sum)[2, 4]# extract the p-value for the treatment sim1_p <-coefficients(sim1_sum)[2, 5] sim2_p <-coefficients(sim2_sum)[2, 5]# bind into a data.frame sim_reps[[i]] <- dplyr::tibble(rep = i,sim =c(1, 2),t_val =c(sim1_t, sim2_t),p_val =c(sim1_p, sim2_p))}
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
Warning: Model failed to converge with 1 negative eigenvalue: -1.4e+01
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00203503 (tol = 0.002, component 1)
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Warning: Model failed to converge with 1 negative eigenvalue: -5.1e-01
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00276737 (tol = 0.002, component 1)
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00317306 (tol = 0.002, component 1)
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
Warning: Model failed to converge with 1 negative eigenvalue: -1.6e+01
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
Warning: Model failed to converge with 1 negative eigenvalue: -8.5e-01
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Warning: Model failed to converge with 1 negative eigenvalue: -1.1e+01
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Warning: Model failed to converge with 1 negative eigenvalue: -4.8e+00
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
Warning: Model failed to converge with 1 negative eigenvalue: -2.7e+01
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00911966 (tol = 0.002, component 1)
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
Warning: Model failed to converge with 1 negative eigenvalue: -1.3e+01
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
# bind into a data.framesim_rep_df <- dplyr::bind_rows(sim_reps)
# compare the t-values between the two simulationsggplot(data = sim_rep_df,mapping =aes(x = t_val, fill =as.character(sim))) +geom_density(alpha =0.5, bw =0.7) +theme_minimal()
All of this is to say that, in this case, these simulations indicate that using model 2 is a more appropriate analysis. The only time that model 1 would become more appropriate is if there was only a single replicate per site or if there was very little variation in the treatment effect among locations. This is almost certainly not the case given this kind of field experiment. Moreover, even if this did turn out to be the case, it would still be better to try and model the variation in treatment effects and, only if the model was unable to fit, to use the simpler model.
This post was inspired by two great papers that helped me understand this varying intercept/varying slope distinction which are both well-worth reading. They are talking from a psychology perspective but similar problems apply to the ecological literature:
Gelman and Brown (2024): http://www.stat.columbia.edu/~gelman/research/published/healing3.pdf
Barr et al. (2013): https://www.sciencedirect.com/science/article/pii/S0749596X12001180
Bonus round…
In this case, I made the simplifying assumption that the biodiversity variable was continuous and normally distributed. However, very few measures of biodiversity are actually like this… Take species richness, for example, this is probably more appropriate modelled as a Poisson random variable. We are now going to repeat Simulation 2 but using a Poisson generalised linear mixed model to show you what this looks like.
Like previously, we are going to write the equation. As you can, the model is very similar. The only difference is that the observed biodiversity (\(B_{ij}\)) is Poisson distributed around some (\(\lambda_{ij}\)). Moreover, the \(\alpha_{j}\) and \(\beta_{j}\) parameters determine the \(log(\lambda_{ij}\) value. This ensures that the \(\lambda_{ij}\) values are positive and that the linear model goes from negative infinity to positive infinity.
Given the similarity in the equation, the simulation is very similar as well.
# set the mean species richness value across locationsalpha_bar =log(100)# set the variation in species richness across locationstau_alpha =log(100)/10print(tau_alpha)
[1] 0.460517
# set the treatment effectbeta_bar =log(5)# set the variation in the treatment effecttau_beta =log(5)/2# set the correlation between beta_j and alpha_jrho =0.5# set the number of locationslocations =1000# set the number of replicates within each site (two sites per location)replicates =50# Build the covariance matrixcov_mat <-matrix(c(tau_alpha^2, rho * tau_alpha * tau_beta, rho * tau_alpha * tau_beta, tau_beta^2), nrow =2, ncol =2)# Simulate from the multivariate normal distributiondist_samp <- MASS::mvrnorm(n = locations, mu =c(alpha_bar, beta_bar), Sigma = cov_mat)# Convert to tibble and name columnsdist_samp <- dplyr::as_tibble(as.data.frame(dist_samp))names(dist_samp) <-c("a_j", "b_j")# Build the data frame with simulated meansdat_mu <- dplyr::tibble(location =rep(seq_len(locations), 2),treatment =rep(c("control", "treatment"), each = locations),mu_ij =c(dist_samp$a_j, dist_samp$a_j + dist_samp$b_j) )
But, things change when we have to simulated the observed biodiversity values. For this, we take the exponentiated \(\mu_{ij}\) value and pass it through the Poisson distribution to generate the observed biodiversity values (\(B_{ij}\)).
# Simulate species richness replicates for each location-treatment combinationdat <- dat_mu |> dplyr::rowwise() |> dplyr::mutate(B_i =list(rpois(n = replicates, lambda =exp(mu_ij))) ) |> dplyr::select(location, treatment, B_i) |> tidyr::unnest(cols =c(B_i))# Convert location variable to characterdat$location <-as.character(dat$location)# Fit the linear mixed model with correlated random intercepts and slopesglmm1 <- lme4::glmer(B_i ~ treatment + (1+ treatment | location), family=poisson(link = log), data = dat)# check the objectsummary(glmm1)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: poisson ( log )
Formula: B_i ~ treatment + (1 + treatment | location)
Data: dat
AIC BIC logLik deviance df.resid
842358.2 842405.7 -421174.1 842348.2 99995
Scaled residuals:
Min 1Q Median 3Q Max
-4.0347 -0.6746 -0.0110 0.6630 4.3132
Random effects:
Groups Name Variance Std.Dev. Corr
location (Intercept) 0.2199 0.4689
treatmenttreatment 0.6561 0.8100 0.51
Number of obs: 100000, groups: location, 1000
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.60293 0.01483 310.30 <2e-16 ***
treatmenttreatment 1.63148 0.02561 63.69 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
trtmnttrtmn 0.512